Sideways Transliteration: How to Transliterate Multicultural Person Names?

Abstract

In a global setting, texts contain transliterated names from many cultural origins. Correct transliteration depends not only on the source and target languages but also on the language in which the name originated. We introduce a novel methodology for transliterating names that originate in different languages using only monolingual resources. Our method first generates noisy candidate transliterations and then ranks them using origin-specific letter models. The transliteration table used for noisy generation is learned in an unsupervised manner for each possible origin language. We gather the monolingual training data required by our method by mining social media sites such as Facebook and Wikipedia. We present results for transliteration from English to Hebrew and provide an online web service for this language pair.
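The two-stage idea in the abstract (noisy generation followed by ranking with a letter model) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the hand-written `TOY_TABLE` stands in for the transliteration table that the paper learns without supervision, the add-one smoothed character bigram model stands in for the origin-specific letter models (whose actual form the abstract does not specify), and the names `generate_candidates`, `BigramLetterModel`, and `rank` are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy table (NOT the learned table from the paper): each English
# letter maps to one or more Hebrew strings; "" means the letter is dropped.
TOY_TABLE = {
    "d": ["ד"],
    "a": ["א", ""],
    "v": ["ו", "ב"],
    "i": ["י", ""],
}


def generate_candidates(name, table):
    """Noisy generation: every combination of the per-letter options.

    Letters missing from the table are silently dropped in this sketch.
    """
    options = [table.get(ch, [""]) for ch in name.lower()]
    candidates = {"".join(parts) for parts in product(*options)}
    candidates.discard("")
    return sorted(candidates)


class BigramLetterModel:
    """Add-one smoothed character bigram model over monolingual names.

    Assumption: one such model would be trained per origin language.
    """

    def __init__(self, names):
        self.bigrams = Counter()
        self.contexts = Counter()
        self.alphabet = {"$"}  # "$" marks the end of a name
        for name in names:
            padded = "^" + name + "$"  # "^" marks the start of a name
            self.alphabet.update(name)
            for a, b in zip(padded, padded[1:]):
                self.bigrams[(a, b)] += 1
                self.contexts[a] += 1

    def log_prob(self, word):
        padded = "^" + word + "$"
        v = len(self.alphabet)
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.contexts[a] + v))
            for a, b in zip(padded, padded[1:])
        )


def rank(name, table, model):
    """Rank the noisy candidates by letter-model score, best first."""
    return sorted(generate_candidates(name, table), key=model.log_prob, reverse=True)


if __name__ == "__main__":
    # Two illustrative training names; real training data would be far larger.
    model = BigramLetterModel(["דויד", "דוד"])
    for candidate in rank("david", TOY_TABLE, model):
        print(candidate, round(model.log_prob(candidate), 3))
```

Under these assumptions, handling several origins would amount to keeping one table and one letter model per origin language and ranking with the pair that matches the name's origin. Note that an unnormalized n-gram score favors shorter candidates, so a practical ranker would likely need some form of length normalization.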


Similar resources

Transliteration of Named Entity: Bengali and English as Case Study

This paper presents a modified joint source-channel model that is used to transliterate a Named Entity (NE) from the source language to the target language and vice versa. As a case study, Bengali and English have been chosen as the source and target language pair. A number of alternatives to the modified joint source-channel model have also been considered. The Bengali NE is divided int...


A Modified Joint Source-Channel Model for Transliteration

Most machine transliteration systems transliterate out-of-vocabulary (OOV) words through intermediate phonemic mapping. A framework is presented that allows direct orthographical mapping between two languages of different origins that employ different alphabets. A modified joint source-channel model, along with a number of alternatives, is proposed. Aligned transliteration...


Chinese-to-English Backward Machine Transliteration

It is challenging to transliterate named entities across languages. It is even more challenging to backward transliterate a transliterated term into its original form. This paper addresses the problem of backward transliterating a person name from Chinese to its English counterpart. We propose a statistical backward transliteration method. Our method uses English sub-syllable and Chinese syllable a...


KUNLP System for NTCIR-3 English-Korean Cross-Language Information Retrieval

This paper describes the KUNLP system for the English-Korean cross-language information retrieval track of the NTCIR-3 workshop, together with some experiments conducted after the workshop. A query translation method based on a bilingual dictionary and the document language corpus was used. To automatically transliterate some proper nouns such as Korean person names, Korean place names, and Korean company names, we have co...


Using Transliteration of Proper Names from Arabic to Latin Script to Improve English-Arabic Word Alignment

Bilingual lexicons of proper names play a vital role in machine translation and cross-language information retrieval. Word alignment approaches are generally used to construct bilingual lexicons automatically from parallel corpora. Aligning proper names is particularly difficult when the source and target languages of the parallel corpus do not share the same script. We present in ...




Publication date: 2011